fix(terminal): filter terminal query sequences from captured output#2334
fix(terminal): filter terminal query sequences from captured output#2334jpshackelford wants to merge 7 commits intomainfrom
Conversation
Filter terminal query sequences (DSR, OSC, DA, etc.) from captured PTY output before returning from terminal tool. These queries cause the terminal to respond when displayed, producing visible escape code garbage. Root cause: CLI tools like `gh` send terminal queries as part of their progress/spinner UI. When captured and displayed, the terminal processes them and responds, causing visible garbage like `^[[38;1R`. Solution: Add filter_terminal_queries() to strip query sequences while preserving legitimate formatting codes (colors, bold, etc.). Fixes: #2244 Co-authored-by: openhands <openhands@all-hands.dev>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Requires Eval Verification
Taste Rating: Code is clean and solves a real problem. However, this touches terminal output handling and needs eval verification before approval per repo guidelines.
Assessment:
- ✅ Solves real problem (visible escape code garbage)
- ✅ Simple, targeted solution (filter at source)
- ✅ Comprehensive tests with good coverage
⚠️ Touches terminal/stdout handling → flag for lightweight evals
The implementation is solid. Regex patterns are compiled, tests verify both removal and preservation, and the fix is applied at the right layer. Once evals confirm no regressions, this is ready to merge.
openhands-tools/openhands/tools/terminal/terminal/terminal_session.py
Outdated
Show resolved
Hide resolved
Coverage Report •
|
||||||||||||||||||||||||||||||
Change the OSC filter pattern from matching specific codes (10, 11, 4) to matching any OSC query (sequences ending with ;? before terminator). This is more future-proof and catches additional query types like: - OSC 12 (cursor color) - OSC 17 (highlight background) - Any other OSC queries that follow the standard format The pattern now matches: ESC ] Ps [;param] ;? TERMINATOR Where ;? indicates it's a query, not a set operation. Importantly, SET operations are preserved: - OSC 0 (window title) - OSC 8 (hyperlinks) - OSC 7 (working directory) Co-authored-by: openhands <openhands@all-hands.dev>
|
Good catch! I've updated the OSC pattern to be more general. Before: Matched only OSC codes 10, 11, 4 The new pattern: This catches all OSC queries:
While preserving SET operations:
Added 5 new tests to verify the behavior. See commit a499579. |
Adds .pr/test_real_world.py that runs an agent with the gh command to verify terminal query sequences are properly filtered. Usage: LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \ uv run python .pr/test_real_world.py Co-authored-by: openhands <openhands@all-hands.dev>
|
📁 PR Artifacts Notice This PR contains a
|
Manual Testing InstructionsA real-world test script is available in How to Run# Clone and checkout the branch
git fetch origin fix/terminal-escape-filter-minimal
git checkout fix/terminal-escape-filter-minimal
# Run the test (uses All-Hands LLM proxy)
LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \
uv run python .pr/test_real_world.pyWhat the Test Does
Success Criteria✅ Pass if:
❌ Fail if:
Without the Fix (for comparison)To see the problem on git checkout main
LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \
uv run python .pr/test_real_world.pyYou should see escape code garbage like |
|
I have confirmed that this solution works. Note that there is a change in what the agent displays. Since we filter out OSC queries, the cli does not render the gh spinner. I think this an acceptable limitation. (Note that we aren't preventing the spinner from displaying, but when Why the
|
|
I think this is ready except that should probably test the CLI built against this version of the SDK to ensure that our approach here doesn't interfere with the TUI. Perhaps the best course is to open a PR that will build the CI against this branch of the SDK and recruit some users to use it for a day or two. |
This CLI build uses the software-agent-sdk branch from PR #2334 which includes the terminal escape filter fix for tools like gh, npm that use spinner/progress UI. SDK PR: OpenHands/software-agent-sdk#2334 Co-authored-by: openhands <openhands@all-hands.dev>
|
It looks like testing this in the CLI is blocked until breaking change in #2133 is dealt with in the CLI unless we rebase this fix branch on v1.11.5. |
|
This PR was merged and should address the blocker on testing with OpenHands-CLI OpenHands/OpenHands-CLI#587 |
|
@OpenHands Do a /codereview-roasted on this PR. Publish your review feedback as review on the PR, using your appropriate event in gh api. (not a comment, a review, you are allowed to review) |
|
I'm on it! enyst can track my progress at all-hands.dev |
enyst
left a comment
There was a problem hiding this comment.
🔴 Needs improvement
This is aimed at a real bug and the implementation stays pleasantly small, but there are two correctness holes here: the fix only applies to output coming back through TerminalSession, and the filter is stateless so split escape sequences can still leak through incremental updates. Since this also touches terminal/stdout handling, I’d want lightweight eval coverage after those are addressed.
Verdict: not ready as the claimed fix for #2244 yet.
Key insight: sanitizing after the stream has already been split into per-observation chunks is too late; either the sanitization needs carry-over state, or part of the fix has to live at the actual SDK terminal boundary rather than only in the terminal tool output path.
openhands-tools/openhands/tools/terminal/terminal/terminal_session.py
Outdated
Show resolved
Hide resolved
openhands-tools/openhands/tools/terminal/utils/escape_filter.py
Outdated
Show resolved
Hide resolved
|
Final summary:
Checklist:
Because I did not change repository files, there was nothing to push to the remote branch. |
|
[Automatic Post]: It has been a while since there was any activity on this PR. @jpshackelford, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
Addresses review feedback: 1. Narrowed scope: Added documentation clarifying this fix only handles PTY-emitted queries (from commands run via terminal tool), not SDK-side queries (e.g., Rich library capability detection). SDK-side queries would need filtering at the conversation/visualizer boundary. 2. Made filter stateful: Created TerminalQueryFilter class that tracks incomplete escape sequences across chunks. This handles the case where long-running commands surface output incrementally and a query sequence may be split across multiple observations. Changes: - escape_filter.py: Added TerminalQueryFilter class with reset(), filter(), and flush() methods. filter() holds incomplete sequences at chunk end, prepends them to next chunk, ensuring split queries are detected and removed. Kept stateless filter_terminal_queries() for backward compat. - terminal_session.py: Uses TerminalQueryFilter instance, calls flush() when command completes, and reset() before next command. - Added 11 new tests for stateful filtering scenarios. Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Requires Eval Verification
Code quality is solid: real bug fix with clean stateful filtering, well-tested (31 tests), all previous concerns addressed.
Eval gate: Touches terminal output handling → requires lightweight eval verification per repo policy before approval. Code is merge-ready once evals confirm no benchmark impact.
Verdict: ✅ Flagging for maintainer to run evals
There was a problem hiding this comment.
HUMAN: WDYT?
(Openhands-GPT-5.4)
The scope/rationale looks good to me — the narrower “filter at the PTY source” approach is a lot saner than the broader earlier attempt — but I wouldn’t merge it as-is.
Overall assessment: good direction, but there’s one real correctness hole in the stateful filter. Since this changes terminal-output behavior, I’d also treat it as eval-risk after fixing.
Security / correctness
-
[openhands-tools/openhands/tools/terminal/utils/escape_filter.py, Lines 76–81, 145–153] 🐛 DECRQSS queries still leak if the chunk split happens between the
ESCand\\of the ST terminator.
_INCOMPLETE_ESC_PATTERNonly buffers\x1bP[^\x1b]*$, so for a split like:- chunk 1:
"text\x1bP$qsetting\x1b" - chunk 2:
"\\more"
the first call emits
"text\x1bP$qsetting"instead of keeping the whole DCS pending, and the second call outputs the rest. I reproduced this locally; the final result was:'text\x1bP$qsetting\x1b\\more'instead of
"textmore".Suggested fix: make the incomplete-DCS handling keep the entire pending
\x1bP...sequence until a full\x1b\\arrives, including the case where the chunk ends on theESCthat starts the ST terminator. - chunk 1:
Testing
-
[tests/tools/terminal/test_escape_filter.py, Lines 246–253] 🧪 The DECRQSS coverage misses the failing split boundary.
The current test only splits before the ST terminator. Add a regression test that splits exactly at:"text\x1bP$qsetting\x1b""\\more"
That would catch the bug above and matches the “arbitrary chunk boundary” claim in the PR description.
So: good idea, not quite correct yet. After that’s fixed, I think the scoped approach in the description is reasonable.
|
[Automatic Post]: This PR seems to be currently waiting for review. @xingyaoww, could you please take a look when you have a chance? |
Summary
This is a more narrowly scoped solution to #2244 than our original change
#2245 which has more risk of unintended consequences than this PR.
This fix handles queries captured from PTY output (commands run via the terminal tool) and the only known cases of real problems. SDK-side queries (e.g., Rich library capability detection) are not addressed here and would require filtering at the conversation/visualizer boundary, but they have also not been observed in the wild and do not warrant the more complex change that it would entail.
Problem
When CLI tools like
gh,npm, or other progress-indicator tools run inside the SDK's PTY, they send terminal query sequences as part of their spinner/progress UI:\x1b[6n- DSR (Device Status Report) - cursor position query\x1b]11;?- OSC 11 - background color query\x1b[c- DA (Device Attributes) queryThese queries get captured as part of the command output. When the output is later displayed to the user's terminal, the terminal processes these queries and responds, causing visible garbage like:
How to Reproduce
gh pr list --repo OpenHands/openhandsVisual Example
Before fix:
After fix:
Root Cause Analysis
The escape codes are IN the captured PTY output stream, not generated by terminal responses to the SDK's own queries. When
gh(or similar tools) runs:ghsends\x1b[6nto query cursor position (for spinner positioning)Solution
Add
TerminalQueryFilterclass to strip terminal query sequences from captured output before returning from the terminal tool. This removes the queries at the source, so the user's terminal never sees them.Key features:
Filtered sequences:
\x1b[6n) - cursor position query\x1b]N;?) - color queries (foreground, background, palette)\x1b[c,\x1b[>c) - device attributes\x1bP$q...\x1b\\) - terminal state queriesPreserved sequences:
\x1b[31m,\x1b[0m, etc.)\x1b[H,\x1b[5A, etc.)Testing
# Run the unit tests uv run pytest tests/tools/terminal/test_escape_filter.py -vFiles Changed
openhands-tools/.../utils/escape_filter.pyTerminalQueryFilterclass for stateful filteringopenhands-tools/.../utils/__init__.pyopenhands-tools/.../terminal_session.pytests/tools/terminal/test_escape_filter.pyDesign Decisions
Stateful filtering: The filter maintains state across calls to handle escape sequences split across chunks. This is necessary because long-running commands surface output incrementally.
Filter at source: Apply filter where output is captured (terminal tool), not where it's displayed. This is simpler and more reliable.
Byte-level regex: Use compiled regex patterns on bytes for accurate escape sequence matching.
Preserve formatting: Only remove query sequences that trigger responses; keep colors and cursor movement intact.
Minimal scope: This fix targets only PTY output processing - SDK-side queries (e.g., Rich library) are out of scope and would need changes at the visualizer boundary.
Partially addresses: #2244
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:8f20826-pythonRun
All tags pushed for this build
About Multi-Architecture Support
8f20826-python) is a multi-arch manifest supporting both amd64 and arm648f20826-python-amd64) are also available if needed